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ESTIMATING THE NUMBER OF NEURONS IN 
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Dedicated to Charles M. Stein on his ninetieth birthday 

A common way of studying the relationship between neural ac- 
tivity and behavior is through the analysis of neuronal spike trains 
that are recorded using one or more electrodes implanted in the brain. 
Each spike train typically contains spikes generated by multiple neu- 
rons. A natural question that arises is "what is the number of neu- 
rons v generating the spike train?" This article proposes a method- 
of-moments technique for estimating v. This technique estimates the 
noise nonparametrically using data from the silent region of the spike 
train and it applies to isolated spikes with a possibly small, but non- 
negligible, presence of overlapping spikes. Conditions are established 
in which the resulting estimator for v is shown to be strongly consis- 
tent. To gauge its finite sample performance, the technique is applied 
to simulated spike trains as well as to actual neuronal spike train 
data. 

1. Introduction. In the field of neuroscience, it is generally acknowledged 
that neurons are the basic units of information processing in the brain. They 
play this role by generating highly peaked electric action potentials or, more 
simply, spikes [cf. Brillinger (1988), Dayan and Abbott (2001)]. A sequence 
of such spikes over time is called a spike train. A typical method of recording 
spike trains is by inserting electrodes into the brain. In the analysis of a spike 
train, Brown, Kass and Mitra (2004) note three goals: (i) identify each spike 
as "signal" (versus pure noise) , (ii) determine the number of neurons being 
recorded, and (iii) assign each spike to the neuron (s) that produced it. (i), (ii) 
and (iii) are collectively termed spike sorting in the neuroscience literature, 
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and, as remarked by Brown, Kass and Mitra (2004), are mandatory for 
all multi-neuronal spike train data analyses. The accuracy of spike sorting 
critically affects the accuracy of all subsequent analyses. For spike sorting to 
be well defined, it is generally assumed that each neuron generates a spike 
with a characteristic voltage waveshape (apart from noise) and that distinct 
neurons have distinct spike waveshapes. For example, Figure 3 in Section 4 
presents the different spike waveshapes of 5 neurons that were estimated 
from real data. 

This article assumes that (i) has been achieved, that is, each spike in 
the spike train has been identified. The objective here is to estimate the 
number of neurons that produced the spike train using the data from (i). This 
problem has the following feature that makes it a nonstandard clustering 
problem. Most often, a spike is produced by one (and only one) neuron, as 
no other neuron spikes around that time. Such a spike is called an isolated 
spike. On the other hand, there may be instances where two or more neurons 
spike in synchrony, that is, at almost the same time [cf. Lewicki (1998), page 
R68]. Then the resulting spike is an additive superposition of the spikes 
generated by this group of neurons. This spike is called an overlapping spike. 
Consequently, if considered as a clustering problem, the number of clusters 
will not in general be equal to the number of neurons producing the spike 
train. 

For definiteness we shall assume that there are v neurons, labeled 1 to 
v, generating the spike train and that n spikes, say, S±,...,S n , are de- 
tected and recorded in the spike train. Here the SVs are aligned accord- 
ing to their peaks and each Si £ W 1 for some d £ Z + . The d components 
of Si are the measurements of the voltage values of the spike on a regular 
grid of time-points around the peak (or maximum) of the spike. Writing 
Si = (<Si,i, • • • , S^d)' £ we assume that 

(1) Sij = Qij + rjij Vi = l,...,n,j = l,...,d, 

where rjij,i = 1, . . . ,n,j = l,...,d, are i.i.d. noise random variables with 
mean 0, variance a 2 and 0j = (0j,i, . . ., @i,d)' £ i = 1,. . . ,n, are i.i.d. 
random vectors. The 0j's and 7ft j's are assumed to be all independent. Oj 
denotes the denoised spike shape of Si and is random because the denoised 
spike shape is a function of the particular neuron(s) generating it and time 
lag in the spiking times if there are >2 neurons. Let a £ M. d be a constant 
vector such that a' a = 1. In Sections 4 and 5, a is taken to be the first prin- 
cipal component of Si,...,S n and O.Olri vector 0's. However, other choices 
of a are possible too. For each spike Si, define 

(2) Xi = a'Si Vi = l,...,n, 

which is the projection of Si onto a. It follows from (1) that X\, . . . ,X n are 
i.i.d. random variables. As observed by Ventura (2009), either implicitly or 
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explicitly, almost all spike sorting methods assume that Xi has a mixture 
distribution with probability density function of the form 

V 

( 3 ) Y Y ^3u-,k h h,-,h( x ) VxGM, 

9=1 l<jx<—<jq<u 

where 7rj lr ..j is the probability that Si is generated by (and only by) neurons 
ji,...,j q . The hj l! ,,,j q , s are usually assumed to be Gaussian densities [cf. 
Lewicki (1994, 1998)], even though t densities have also been proposed [cf. 
Shoham, Fellows and Normann (2003)]. 

In this article we shall not assume that hj(-), j = 1, . . . , u, are Gaussian 
or t densities but only that hj(-) = f^^i') belongs to a location family of 
probability densities [cf. Lehmann (1983), page 154]. It is noted that this 
location family contains both Gaussian and t densities. Here \Xj and a 2 
denote the mean and variance induced by the density f^. a 2 . 

Due to the complex nature of spike overlap, we do not think it is accurate 
to model each hj l7 ,„j q , 2 < q < v, as a Gaussian or a t density (or, indeed, any 
other parametric density having only a small number of unknown parame- 
ters). For example, hi? depends on noise as well as on the time lag between 
the spiking times of neurons 1 and 2. As the time lag is also random and 
that the phenomenon of neurons spiking in close proximity to each other is 
still not well understood, we feel it is more appropriate to model hi % using 
a nonparametric density (rather than a parametric one). In particular, in 
this article, we shall assume that (3) is of the form 



V 



(4) 



+ Y Y ^h,-,k / f^(x)dG ju ... dq (fi), 



q=21<j 1 <-<j q <v 



where iri > TT2 > • • • > tt u > and Gji v ..j 9 's are unknown absolutely contin- 
uous probability distributions. Consequently, (4) leads to a nonparametric 
location mixture. 

The objective is to estimate v in (4) using the sample X±, . . . ,X n and 
an independent auxiliary sample of i.i.d. observations Yi,...,Y m obtained 
from the silent region of the spike train. The silent region is defined to be 
the sections of the spike train where there are no spikes. Hence, the Yj's are 
defined as 

(5) Yi = (r t l 1 ,...,ril d )a V/ = l,...,m, 

where 77*^ . . . , r? z * rf are noise voltage measurements on a regular grid of d con- 
secutive time-points of the silent region of the spike train. We assume that 
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rifj,rji t k, 1 < I < m,l < i < n,l < j,k < d, are i.i.d. random variables with 
mean and variance a 1 . In this article a method-of- moments estimator v 
for v is proposed. This estimator has a number of attractive properties. First, 
this article establishes a reasonably transparent theory justifying/supporting 
v. In particular, v is a strongly consistent estimator for v under mild condi- 
tions. Second, the estimator can be computed without first (or concurrently) 
computing the other unknown quantities in (4). Consequently, it is computa- 
tionally very fast relative to, say, EM or Markov chain Monte Carlo (MCMC) 
algorithms. 

We would like to add that the above problem of estimating v can be 
regarded as robust and (yet) consistent estimation of the number of com- 
ponents of a finite mixture where the latter is subjected to a small but 
nonnegligible contamination by a nuisance distribution. While quite a num- 
ber of papers have been written on estimating the number of components 
of a finite mixture [cf. Dacunha-Castelle and Gassiat (1997) and references 
cited therein] , we are not aware of any work in the statistics literature that 
deals with this problem when the mixture is contaminated by another dis- 
tribution. 

On the neuroscience literature side, numerous algorithms for spike sorting 
have been proposed. A review of spike sorting algorithms and a discussion 
of their strengths and limitations can be found in Lewicki (1998). In par- 
ticular, a considerable number of spike sorting algorithms assume that the 
proportion of overlapping spikes is negligible relative to the proportion of 
isolated spikes and, hence, the possible presence of overlapping spikes is ig- 
nored. With this assumption, many spike sorting algorithms further assume 
that the number of neurons v is known and the problem reduces to a stan- 
dard classification problem. If v is unknown, other spike sorting algorithms 
use various EM or MCMC methods for determining v as well as for assign- 
ing spikes to the neurons [cf. Pouzat, Mazor and Laurent (2002), Nguyen, 
Frank and Brown (2003), Wood and Black (2008)]. However, Brown, Kass 
and Mitra [(2004), page 456] noted that MCMC techniques have yet to be 
widely tested in spike sorting. 

Spike sorting algorithms that take overlapping spikes into account usu- 
ally involve significant user input [cf. Mokri and Yen (2008)]. Section 5 of 
Lewicki (1998) discusses the use of templates, independent component anal- 
ysis and neural networks to handle overlapping spikes and the limitations 
of these methods. If v is known, there are at least two model-based ap- 
proaches for handling overlapping spikes. The first approach considers a 
(y + l)-component mixture distribution with the first v components mod- 
eling the spike waveforms of the v neurons. The {y + l)th component is a 
uniform density over a suitably large region of the sample space [cf. Sahani 
(1999), page 95] that serves as an approximate model for the overlapping 
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Table 1 
Factorial experimental design 



Experiment 


Noise distribution 


Spike detection algorithm 


Proportion of 
overlapping spikes 


1 


Gaussian 


Oracle 


10% 


2 


Gaussian 


SpikeOMatic 


10% 


3 


Gaussian 


Oracle 





4 


Gaussian 


SpikeOMatic 





5 


Student £5 


Oracle 


10% 


6 


Student t$ 


SpikeOMatic 


10% 


7 


Student ts 


Oracle 





8 


Student is 


SpikeOMatic 






spikes. The second approach is a trimming method where a number of the 
largest observations (outliers) are omitted [cf. Gallegos and Ritter (2005), 
Garcia-Escudero et al. (2008)]. The rationale is that most of the outliers cor- 
respond to overlapping spikes and, hence, the remaining observations should 
be comprised essentially of isolated spikes. 

The rest of the article is organized as follows. Section 2 introduces a 
number of trigonometric moment matrices. Theorem 1 derives some explicit 
error bounds for their eigenvalues. Motivated by these error bounds, Section 
3 proposes a method-of-moments estimator v for v. Theorem 2 shows that 
z> is a strongly consistent estimator for v under mild conditions. A point of 
note is that Theorem 2 does not require the proportion of overlapping spikes 
to be asymptotically negligible as sample size tends to infinity. 

Section 4 presents a detailed simulated spike train study that investigates 
the finite sample accuracy of v. Each spike train is generated by v neurons 
where v varies from 1 to 5. The spike shapes of these neurons are estimated 
from real data. There are 8 experiments in the study which present a variety 
of different spike train situations depending on the proportion of overlapping 
spikes, the spike detection technique and sample sizes n,m. Table 1 sum- 
marizes the 8 experiments as a factorial design. This study finds that v has 
very good accuracy with regard to these 8 experiments for moderately large 
sample sizes such as n = 1000 and m = 2000. As a comparison, we have also 
applied the SpikeOMatic software [cf. Pouzat, Mazor and Laurent (2002); 
Pouzat et al. (2004)] to obtain an alternative estimate v\ for v in 2 of these 
experiments. 

Section 5 considers two spike train data sets taken from Lewicki (1994). 
The first is an actual 40-second spike train recording and the second is a 
synthesized recording using 6 spike shapes estimated from the first data 
set. Figure 1 presents a portion of the actual spike train recording. Lewicki 
(1994) inferred that v is 6 for the actual recording. Three estimators are 
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Fig. 1. A portion of Lewicki's actual spike train recording. 



used to estimate v from each spike train, namely, i>2 by Lewicki's spike sort- 
ing algorithm [cf. Lewicki (1994)], i>\ by SpikeOMatic software [cf. Pouzat, 
Mazor and Laurent (2002)] and v. For the synthesized recording, we obtain 

= ^2 = 5 while v = 4 or 5 depending on whether the threshold of v is 
set to 1.0 or 0.8, respectively. On the other hand, for the actual spike train 
recording, we obtain z>i = 12, i>2 = 9 while v = 4 or 5 depending on whether 
the threshold of v is set to 1.0 or 0.8, respectively. Thus, relative to v\ and 
V2, v is a more stable estimate with respect to these 2 data sets. 

The article ends with some concluding remarks in Section 6. The symbols 
K, C denote the set of real numbers, complex numbers respectively, and all 
proofs are deferred to the Appendix. 

2. Some trigonometric moment matrices. This section constructs a num- 
ber of trigonometric moment matrices and derives explicit error bounds for 
their eigenvalues. Following the notation in (4), let 9 denote a random vari- 
able with cumulative distribution function 

V V 

Fe(fJ-) = J2 n i T {v> H) + Yl j-Sl') 

j=l q=2i<h<-<j q <v 

(6) 

where tt\ > Ti2 > • • • > tt u > and !{■} denotes the indicator function. For 
simplicity, let 7r cont be the proportion of overlapping spikes and 7r con tF cont 
be the continuous component of Fg. Then 



7T, 



cont 



E E 

q=2 l<ji<-<j q <v 



V 

3=1 
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q=2 1<ji<-<j q <u 

Let p > ^ be an integer. Motivated by the ideas of Lindsay (1989a), (1989b) 
on moment matrices and finite mixtures, we define a function T p :M— > 

£{P+l)X{ P +l) ^ 



e~ i: 



T p {x) 



\2.r 



-\2x 



-ipx p —i(p-l)x p -i(p-2)x 



( 1 \ 



-\2x 



ix A2x 



(l,e-,e 



D ipx\ 



e ipx 
e i(p-l)x 
R i(p-2)x 



\ 



1 / 



where i = \J — 1. We further define matrices corresponding to the discrete 
component and the continuous component of Fg by 



/ 1 



-M"p,disc = ^2 'KiTp&i 



i=l 



-iMl 
-i2/ii 



•»PMi 



1 \ 

-i2/i„ 



-ipfJ-v 



/7Ti 
7T 2 

\ o o 



\ 





(7) 



f27T 



M, 



p,cont 



/ T p (fj,)f cont (fj,)d(fx)= / T p (//) V f cont (fi + 2Trj)d(n) 



j=-oo 



where / con t is the probability density function of the distribution F cont . Fi- 
nally, we define 



-^p,disc ^"cont-^p,cont ■ 



Let Aj(yl) denote the ith largest eigenvalue of A where A is an arbitrary 
(p + 1) x (p + 1) Hermitian matrix. Hence, Ai(A) > A 2 (A) > ■ ■ > A p+ i(A). 



8 



M. LI AND W.-L. LOH 



Theorem 1. With the above notation, suppose that fj,±, . . . , fx v are all 
distinct, < fj,±, . . . , \i v < 2ir. Then for i = 1, . . . , v, we have 

(p + l)7Tj + 27T7T con t < mill / CO nt(^ + 27Tj) > 

0<fi<2w * — * 
I j=—oo ) 



(8) 



2 Yl 7r i 7r * 

l<j<k<v 



1 _ gi(p+l)(Mi-Mfc) 



1 _ gi(Mj-Mfc) 



< (p + l)7Ti + 27T7r cont < max V] /contG" + 27rj) 

0</X<27T * » 

^ j=-oo 



+ 



2 X] 7r J' 7Ffc 

l<j<k<v 

Also, for i = v + 1, . . . ,p + 1, we have 



I — e i(p+l)(Mj-Mfc) 



1 _ e i(A*j-Mfc) 



(9) 



2vr7r cont min V] /cont(/i + 27rj) 

0<fi<2w * — * 
j=-oo 



< Aj(Mp) < 27T7r cont max /contG" + 27rj). 

jf=-oo 



The following is an immediate corollary of Theorem 1. 

Corollary 1. Suppose the conditions of Theorem 1 are satisfied. Then 
for any constant 7 > 0, there exists a positive integer p 7 such that 



Xu(M p ) > -yy/p + 1 > A„ + i(M p ) Vp>p 7 . 

Also, \ u +\(Mp) is bounded uniformly in p. Finally, Aj(M p ) ~ (p + l)7Tj, VI < 
i < u, as p — > 00. 

Corollary 1 gives, at least in principle, a way for estimating u by estimat- 
ing the eigenvalues of M p . 

3. A method-of-moments estimator for v. Let rjij, i = 1, . . . , n, j = 1, . . . ,d, 

and 0j, i = 1, ... ,n, be as in (1) and Xi = a' Si, i = 1, . . . ,n, be as in (2). 
Then X\,... ,X n is an i.i.d. sequence of observations from the mixture dis- 
tribution given by (4). We observe that 



Xi = 0* + Yt 



Vi = 1, . . . , n, 
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where 9i = a'®i and Y{ = (r)i t i, . . . , 'qi^'a are independent random variables 
having cumulative distribution function Fg, given by (6), and probability 
density function /o j(T 2, respectively. B{ can be regarded as the signal and Yi 
the zero-mean noise. Here we assume that E(e~ lkYl ) ^ for all k G {1, . . . ,p}. 
This is a very weak assumption and is satisfied by, for example, mean- 
centered normal, t and Gamma distributions. Because 9\ and Y\ are inde- 
pendent, we have 

E{e- ikdl ) = E{e~ ikXl )[E(e- ik?1 )]~\ 

Let Yi, . . . , Y m be as in (5). We observe that the Yj's are obtained from the 
voltage measurements at different time-points of the silent region of the spike 
train. Then Y%, . .. ,Y m are i.i.d. with density f 0a 2 and are also independent 
of X\ , . . . , X n . As Y\ has the same distribution as Yi , we have 

(10) E(e-' lkYl )^0 Vfc€{l,...,p}. 

Since M p = E[T P (6)], we shall estimate M p using its sample analog, that 
is, the (p + 1) x (p + 1) matrix M p whose (J, k)th element is given by 



(11) (M p )j„ 



n 



1 £2=i e-^'-*^ 



m" 1 YT=i e-'^-V Y ' 



M p is a Hermitian matrix and, hence, its eigenvalues are real numbers. The 
method-of-moments estimator u for u is as follows. Let 7, p 7 and p>p^ be 
as in Corollary 1. Define 

(12) i> = #{i : 1 < i <p + 1, Xi(M p ) > Threshold}, 



where 7threshoid = 7\/p + 1 an d #{'} denotes set cardinality. We call 7threshoid 
the threshold parameter of v. 

Theorem 2. Let v be as in (12) with p>p-y. Suppose (10) and the 
conditions of Theorem 1 hold. Then v — > v almost surely as min(m,n) — > 00. 

For v to perform well, it is necessary to obtain a good choice of the thresh- 
old parameter 7threshold- The usual methods, such as cross-validation, for 
determining the value of the threshold parameter do not seem to work here. 
Instead we shall compute explicit error bounds for \{M P ), i = 1, . . . ,p + 1, 
below. These error bounds shall serve as guidelines for setting the value of 

Tthrcshold • 



Lemma 1. Let e > be a constant, 

>e\ Vj = l,...,p, 
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and £l £ = U?=i Then 



p ( n.)<|: m in{ m2[e|E( % yi)|14 (i + (l)), 



42 'l+ofi 



m 3 [e|E(e-^ y i)|] 6 V V 7T1 



It is interesting to note that for the parameter values and sample sizes 
that we are concerned with in this article, the inequality of Lemma 1 gives 
a smaller upper bound than those obtained via the Hoeffding or Bernstein 
exponential-type inequalities. 



Theorem 3. Let Q £ be as in Lemma 1, Q £ be its complement and E * 
denote the conditional expectation given Then with the assumptions of 
Theorem 1, 



p+1 



(13) 



< 



p-j + 1 



+ 



pe A 



^n(l-eyf^(p + m z (o-j)\2 (l-e)2' 
where Z = Y\ja , t/Jz{t) = Ee~ ltz for alltG'R and, hence, Ee~^ Yl = 7pz(o~j). 



We remark that the upper bounds of the inequalities of Lemma 1 and 
Theorem 3, though relatively simple, are conservative in that the quantities 
on the left-hand side are substantially smaller than those on the right-hand 
side. Nonetheless, we shall end this section with an example which computes 
the upper bounds in Lemma 1 and Theorem 3 explicitly. Suppose e = 0.05, 
p = 20, a = 0.1, Yi ~ iV(0,o- 2 ), n = 1000 and m = n 2 . Then P(Sl e ) < 0.01 
and 

p+i 



4. Simulated spike train study. In this section we shall study the finite 
sample performance of the method-of-moments estimator v given by (12) 
via simulated spike trains. 



MULTI-NEURON AL SPIKE TRAIN 



11 



4.1. Spike train generation. For realism, we use spike shapes estimated 
from real data to generate the spike train. The spike shapes are obtained by ap- 
plying Pouzat's software SpikeOMatic which is available at www.biomedicale. 
univ-paris5.fr/SpikeOMatic.html to a tetrode data set recorded from the 
locust {Schistocerca americana) antennal lobe. This tetrode data is dis- 
tributed with the SpikeOMatic software at www.biomedicale.univ-paris5.fr/ 
SpikeOMatic/Data.html. It is a 20-second recording sampled at 15 kHz from 
a tetrode filtered between 300 Hz and 5 kHz. Pouzat, Mazor and Laurent 
(2002) and Pouzat et al. (2004) are two papers behind the software SpikeO- 
Matic. 

SpikeOMatic is applied to the locust data and we selected the 6 spike 
shapes with the largest numbers of spikes. These (four-channel) spike shapes 
are shown in Figure 2. As the two spike shapes with the smallest first channel 
positive peak heights have significant difference only in the fourth channel, 
we will delete one of them in order to generate single channel data in which 
distinct spike shapes are different enough to be distinguished. The resulting 
5 spike shapes are shown in Figure 3 with the noise standard deviation a = 1. 

Once the spike shapes are obtained, a Poisson process is simulated to 
select the time of spike events. Given that a spike occurred, a random vari- 
able with categorical distribution is generated indicating which neuron has 
spiked. The categorical distribution is assumed to have equal probabilities 
for each of v possible spike shapes where v is the number of neurons (i.e., 
n% = ■ ■ ■ = ir u < \jv). Here v ranges from 1 to 5. The n^s can be set to differ- 




o 



r>0 



100 



ir,o 



time 



Fig. 2. Six four- channel spike shapes from the locust data. 
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time 



Fig. 3. Five one-channel spike shapes selected with a = 1. 

ent values but they should not be too close to zero. This is necessary because 
if one of the neurons has infinitesimally small probability of spiking, then it 
would appear that no algorithm can estimate v well with a finite sample. 

As noted previously, Pouzat's locust data is a 20-second recording sam- 
pled at 15 kHz. Thus, the recording has 300,000 time-points on a regular 
grid. There are about 1000 spikes detected. Hence, the collective firing rate 
is about 1000/300,000 = 1/300. In our simulations, we choose a similar col- 
lective firing rate of 1/400, as this gives about 10% overlapping spikes. 

SpikeOMatic rescales the data such that the noise standard deviation 
o" = l. In the generation of a spike train, independent standard Gaussian 
noise, or t$ distributed noise (with degree of freedom 5) multiplied by a/3/5, 
is added to the "signal." Multiplication by a/3/5 is needed since t$ distribu- 
tion does not have standard deviation 1. Here "signal" refers to either the 
spike shape or (in the case of pure noise) . 

4.2. Spike detection. As mentioned in the Introduction, one of the tasks 
of spike sorting is to identify each spike in the spike train. This is known as 
spike detection. The function find. spikes.with. template provided by SpikeO- 
Matic is used to detect spikes from spike train recordings. In our study, the 
detection threshold is set to 2.25cr (which is one of the recommended values 
of SpikeOMatic) where a is the noise standard deviation. This is because 
the spike shape with the smallest positive peak height (in Figure 3) is close 
to 3<T and we do not want to consistently miss detecting spikes with this 



MULTI-NEURON AL SPIKE TRAIN 



13 



spike shape. The function find. spikes. with. template uses template to detect 
spikes and, hence, there should not be too many false positives with de- 
tection threshold = 2.25a if the noise is Gaussian. However, this threshold 
would present problems if the noise has a heavy-tailed distribution such as 
the £5 distribution (cf. Experiment 6 of Section 4.4). For more details of 
spike detection via SpikeOMatic, we refer the reader to the documentation 
for the software. 

4.3. Setting the tuning parameters of v. We observe from the definition 
of v in (12) that v is not scale invariant and that v has three tuning pa- 
rameters, namely, a, p and 7threshoid> to be determined in order to use v. 
First we observe that it is usually the case that the silent region forms a 
sizeable part of a spike train. Since data from the silent region can be used 
to estimate a, we shall, without loss of generality, assume in this subsection 
that a is known. 

The function make. sweeps of SpikeOMatic extracts the spikes and the 
pure noise observations and automatically rescales them such that the noise 
standard deviation is 1. After extracting the spikes, we shall further rescale 
them such that the noise standard deviation a = 0.1. The rationale for this 
will be explained later. Next we set 7thrcshoid = 1 with the implicit assump- 
tion that (p + 1)tt u > 7threshoid- We observe from (8) that the latter is a rough 
proxy for \ u (M p ) > 7thrcshoid- This implicit assumption is indeed satisfied for 
all the 8 experiments. (If this does not hold, we would have to set a smaller 
value for 7threshoid with a corresponding increase in false positives.) 

Let Si, . . . , S n G R denote the extracted spikes. The (normalized) first 
principal component aGl of Si,..., S n , 0, . . . , is computed. Here 0£l d 
and the number of 0's used to compute a is taken to be O.Oln (to the nearest 
integer). The 0's are needed in the computation of a because if not, a is 
not well defined in the case of v = 1 and i.i.d. Gaussian noise. Next define 
Xi = a' Si as in the Introduction. The rationale for projecting the spikes 
onto the first principal component is the hope that this direction will best 
separate the spike shapes from one another as well as from pure noise (which 
is represented by 0's). 

Now we argue that a should be neither too small nor too large after 
rescaling. If a is large, then ifjz(o~j) wm be small. [For example, if the noise 
is Gaussian, we have ipz(o~j) = e - - 7 a Z 2 .] Thus, the bound as on the right- 
hand side of (13) is large. This indicates that the estimation error of the 
eigenvalues will be large, resulting in poor performance of v. 

On the other hand, if a is scaled too small, the distance between some 
pair of /Xj's would likely be small, that is, for some j 7^ k, |1 — e 1 ^* - ^^ 
will be small. This implies that the lower bound in (8) can be less than the 
threshold parameter 7threshold = 1- This again results in poor performance 
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of v. Consequently, we would like the following condition to be satisfied: 



Since m is large, we observe that |i?(e _ip *' 1 )| ~ |m _1 Ylb=i e~ ipY '\- Conse- 
quently, the left-hand side of the inequality in Condition (I) is an approxi- 
mation for the right-hand side of (13) with e = 0.05 and can be used as an 
approximate upper bound for the standard error of the eigenvalue estimates 
in (13). Thus, if Condition (I) holds, 7threshoid = 1 is very likely to exceed 3 
times this standard error and that, for sufficiently small 7r cont , \ u +i(M p ) < 1. 
Finally we choose p = p max where p max is the largest value of p satisfying 
Condition (I). Numerical experiments on the accuracy of v with these values 

Of (7 = 0.1, 

P — Pmax; and '/threshold — 1 will be reported in Section 4.4. 

4.4. Numerical experiments. In this subsection we shall study the per- 
formance of the estimator u of Section 4.3 via 8 simulated spike train ex- 
periments. Each experiment is divided into 5 scenarios depending on the 
number of neurons generating the spike train (i.e., v = 1, . . . , 5). A total of 
at most 5 neurons are considered. The spike shapes of these neurons are 
given in Figure 3. As there are many ways of selecting v neurons from 5 
neurons if v < 5, we shall choose the v neurons in a "least favorable" man- 
ner for estimating v. if v = 1, we take the neuron with the smallest peak 
height in Figure 3 to be the one generating the spike train; if v = 2, then we 
take the 2 neurons with the two smallest peak heights in Figure 3 to be the 
ones generating the spike train; and so on until we reach v = 5. In all the 
experiments the estimator v is used after rescaling the data so that a = 0.1 
and setting p = p max and 7throshoid = 1 • The spike shape vectors each have 
d = 45 components. Table 1 summarizes the 8 experiments as a factorial 
design. 

Experiment 1. In this experiment the noise is i.i.d. Gaussian with 
mean and variance a 1 = 1. We assume that the spikes are detected from 
the spike train without error (or, equivalently, there exists an oracle spike 
detector): 

• The proportion of overlapping spikes 7r con t « 0.1 (or 10%). 

• n = 1000 (or 500), that is, there are 1000 (or 500) spikes detected from 
the spike train. 

• m = 2n, where m is the number of pure noise observations Yj's [as in (5)] 
obtained from the silent region of the spike train. 

• The approximate /ij's (i.e., the projections of the spike shape vector onto 
the first principal component a € M. d ) of these neurons are presented in 



Condition (I): 




Table 2. 
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Table 2 

Approximate values of /iti, . . . , for Experiments 1, 3, 5 and 7 













Mi 


M2 


Ms 


H4 


Ms 


Experiment 


1 


V 


= 


1 


11.1 














V 


= 


2 


9.7 


13.7 












V 


= 


3 


8.4 


11.3 


15.8 










V 




4 


5.7 


9.5 


12.3 


20.4 








V 




5 


11.4 


14.3 


16.9 


19.5 


58.9 


Experiment 


3 


V 




1 


11.7 














V 




2 


8.1 


12.4 












V 




3 


9.2 


12.2 


16.6 










V 




4 


5.5 


9.3 


12.0 


20.2 








V 




5 


11.4 


14.3 


17.0 


19.5 


58.9 


Experiment 


5 


V 




1 


11.2 














V 




2 


10.3 


14.0 












V 




3 


9.9 


13.0 


17.2 










V 




4 


5.5 


9.2 


12.0 


20.1 








V 




5 


11.4 


14.3 


16.9 


19.4 


58.9 


Experiment 


7 


7/ 




1 


11.6 














V 




2 


8.5 


12.7 












V 




3 


9.1 


12.2 


16.5 










V 




4 


5.5 


9.2 


12.0 


20.2 








V 




5 


11.4 


14.3 


17.0 


19.6 


58.9 



Table 3 gives the percentage of the time (out of 100 repetitions) that 
v = v, the true number of neurons producing the spike train. In the case 
n = 1000, m = 2000, u does well, for example, for u = 2, the percentage of 
the time v = v is 97%. In the case n = 500 and m = 1000, v does well for 
the scenarios 1 < v < 4 but does poorly for v = 5. 

Remark. The Editor raised the following question: "would detection 
improve if m/n were larger than 2/1?" The nature in which the sample 
sizes m, n affect the accuracy of v is not a straightforward one. If n is held 
fixed and m is allowed to increase, then the accuracy of v should improve. 
However, if n is also allowed to increase, then p m3jX would likewise increase. 
This implies from (13) and Condition (I) that V'z( cr Pmax) will be harder to 
estimate because the latter is further out in the tail of the characteristic 
function. Thus, in order to estimate V'z(cPmax) accurately, a much larger 
sample of Yi, . . . ,Y m is needed. Thus, doubling both sample sizes m and n 
may not necessarily increase the accuracy of v. However, for sufficiently large 
m (depending on the value of n), the resulting v will improve in accuracy. 
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Table 3 

Frequency (%) of ' v = v with standard error in parentheses for Experiments 1-8 





11 


Til 


V 


1 

= 1 






V = 4 


V - 


E 

= o 


Experiment 1 


1000 


2000 


81 


(3.9) 


97 (1.7) 


100 (0.0) 


100 (0.0) 


100 


(0.0) 




500 


1000 


97 


(1.7) 


100 (0.0) 


100 (0.0) 


100 (0.0) 


10 


(3.0) 


Experiment 2 


1000 


2000 


51 


(5.0) 


79 (4.1) 


98 (1.4) 


100 (0.0) 


100 


(0.0) 




500 


1000 


65 


(4.8) 


99 (1.0) 


100 (0.0) 


100 (0.0) 


22 


(4.1) 


Experiment 3 


1000 


2000 


89 


(3.1) 


98 (1.4) 


100 (0.0) 


100 (0.0) 


100 


(0.0) 




500 


1000 


91 


(2.9) 


100 (0.0) 


100 (0.0) 


100 (0.0) 


5 


(2.2) 


Experiment 4 


1000 


2000 


67 


(4.7) 


98 (1.4) 


100 (0.0) 


100 (0.0) 


100 


(0.0) 




500 


1000 


59 


(4.9) 


100 (0.0) 


100 (0.0) 


100 (0.0) 


21 


(4.1) 


Experiment 5 


1000 


2000 


82 


(3.8) 


97 (1.7) 


100 (0.0) 


100 (0.0) 


100 


(0.0) 




500 


1000 


94 


(2.4) 


99 (1.0) 


100 (0.0) 


100 (0.0) 


6 


(2.4) 


Experiment 6 


1000 


2000 





(0.0) 


(0.0) 


99 (1.0) 


100 (0.0) 


100 


(0.0) 


Experiment 7 


1000 


2000 


88 


(3.2) 


100 (0.0) 


100 (0.0) 


100 (0.0) 


100 


(0.0) 




500 


1000 


88 


(3.2) 


100 (0.0) 


100 (0.0) 


100 (0.0) 


8 


(2.7) 


Experiment 8 


1000 


2000 


(0.0) 


(0.0) 


100 (0.0) 


100 (0.0) 


100 (0.0) 



Experiment 2. This experiment is identical to Experiment 1 except 
that, more realistically, the oracle spike detector is not used. Instead, as 
described in Section 4.2, we use the function find. spikes. with. template (with 
spike detection threshold 2.25<r) provided by SpikeOMatic to detect the 
spikes of the spike train. Table 3 gives a summary of the percentage of the 
time that v = u, the true number of neurons producing the spike train. We 
observe that for moderately large sample sizes n = 1000 and m = 2000, the 
accuracy of v is still reasonable though not as high as in Experiment 1. This 
is due to the fact that spike detection is now not error free. We note that 
the scenario v = 1 with 7r con t ~ 0.1 is unlikely to occur in practice due to the 
refractory period of a neuron which prevents the occurrence of overlapping 
spikes. 

As a comparison, we shall now compute the SpikeOMatic estimate v\ 
of v. Briefly, the SpikeOMatic software uses an EM algorithm to compute 
v\ based on a penalized likelihood function. The likelihood is a finite mix- 
ture of multivariate normal distributions and the penalty is derived from 
the Bayesian information criterion (BIC). This implies the assumption of 
Gaussian noise and no overlapping spikes. Another point of note is that 
the SpikeOMatic software assumes at least 2 neurons generating the spike 
train and, hence, v\ is always >2. There are 2 major tuning parameters 
for the SpikeOMatic software. We take these to be nb. samples. per. site = 3 
and tolerance. factor = 3.5. The other parameters are set as in the software 
tutorial 1 distributed with SpikeOMatic. The performance of v\ appears to 
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Table 4 

Frequency (%) of v\ for Experiments 2 and 3 with n = 1000 and 

m = 2000 







1 = 1 


i>x = 2 


£>i = 3 


/>i = 4 


Oi = 


Experiment 2 


v = l 





100 













v = 2 





100 













v = 3 





100 













u = 4 





20 


80 










u = 5 








80 


20 





Experiment 3 


v = l 





100 













v = 2 





100 













v = Z 





70 


30 










v = 4 








100 










v = 5 








100 









be rather robust to the choice of the 2 tuning parameters. Table 4 gives the 
frequency (%) of v\ for 10 repetitions of the spike train. In particular, when 
v = 2, the frequency oii>\ = v is 100% and when v = 3, 4 or 5, the frequency 
of v\ = v is 0%. 

Experiment 3. In this experiment the noise is i.i.d. iV(0, 1). We assume 
that the spikes are detected from the spike train without error: 

• The proportion of overlapping spikes 7r cont = or, equivalently, that there 
are no overlapping spikes. 

• n = 1000 (or 500) and m = 2n. 

• The approximate /Vs of these neurons are presented in Table 2. 

Table 3 gives the percentage of the time (out of 100 repetitions) that 
v = v. 

As a comparison, we shall now compute the SpikeOMatic estimate v\ 
for v. We take the major 2 tuning parameters to be nb. samples. per. site = 3 
and tolerance. factor = 3.5. The other parameters are set as in the software 
tutorial 1 distributed with SpikeOMatic. Table 4 gives the frequency (%) 
of v\ for 10 repetitions of the spike train. In particular, when v = 2, the 
frequency of v\ = v is 100%, when v = 3, the frequency of v\ = v is 30% and 
when v = 4 or 5, the frequency of v\ = v is 0%. 

Experiment 4. This experiment is identical to Experiment 3 except 
that we use the function find. spikes. with. template (with spike detection 
threshold 2.25<t) provided by SpikeOMatic to detect the spikes of the spike 
train. Table 3 gives the percentage of the time (out of 100 repetitions) that 
v = v. 
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Experiment 5. In this experiment the noise has the £5 distribution 
multiplied by \/3/5 (so as to have mean and variance 1). We assume that 
the spikes are detected from the spike train without error: 

• The proportion of overlapping spikes 7r con t w0.1 (or 10%). 

• n = 1000 (or 500) and m = 2n. 

• The approximate /Vs (i.e., the projections of the spike shape vector onto 
the first principal component a G W 1 ) of these neurons are presented in 
Table 2. 

Table 3 gives the percentage of the time (out of 100 repetitions) that 
v = v. 

Experiment 6. This experiment is identical to Experiment 5 except 
that we use the function find. spikes. with. template (with spike detection 
threshold 2.25a) provided by SpikeOMatic to detect the spikes of the spike 
train. Table 3 gives the percentage of the time (out of 100 repetitions) that 
v = v. Here the accuracy of v is poor for v = 1 or 2. This is almost certainly 
due to the heavy-tailed £5 distribution presenting severe difficulties to the 
SpikeOMatic spike detection algorithm with threshold set to 2.25o\ 

Experiment 7. In this experiment the noise has the £5 distribution 
multiplied by a/3/5 (so as to have mean and variance 1). We assume that 
the spikes are detected from the spike train without error: 

• The proportion of overlapping spikes 7r con t = 0.0, that is, there are no 
overlapping spikes. 

• n = 1000 (or 500) and m = 2n. 

• The approximate /Vs of these neurons are presented in Table 2. 

Table 3 gives the percentage of the time (out of 100 repetitions) that 
v = v. 

Experiment 8. This experiment is identical to Experiment 7 except 
that we use the function find.spikes.with.template (with spike detection 
threshold 2.25<r) provided by SpikeOMatic to detect the spikes of the spike 
train. Table 3 gives the percentage of the time (out of 100 repetitions) that 
v = v. As in Experiment 6, the accuracy of v here is poor for v = 1 or 2 and 
is due to the heavy-tailed £5 distribution presenting severe difficulties to the 
SpikeOMatic spike detection algorithm with threshold set to 2.25o\ 
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4.5. Study summary. The 8 experiments indicate that for moderately 
large sample sizes such as n = 1000 and m = 2000, v is capable of very 
good performance if the noise is i.i.d. Gaussian. If the noise has a heavy- 
tailed distribution such as the t$ distribution, v is still capable of very good 
performance if an oracle (error free) spike detector is used. One reason for 
the good performance of v for moderately large samples is that in all the 
experiments the first principal component separates the spike shapes from 
one another as well as from pure noise very well. 

However, for smaller sample sizes such as n = 500 and m = 1000, per- 
forms reasonably well for 1 < v < 4 if the noise is i.i.d. Gaussian but performs 
poorly for v = 5. This implies that the sample size n = 500 is too small for 
v to handle the latter case well. 

5. A real data example. This section considers two spike train data sets. 
Both of these data sets are taken from Lewicki (1994). One data set is an 
actual 40-second spike train recording. The other data set is a synthesized 
recording using 6 spike shapes estimated from the former data set. For each 
of the 2 data sets, we shall compute and compare the estimates of the number 
of neurons generating the spike train by applying Lewicki's software SUN 
Solaris OS version 1.1.8 [cf. Lewicki (1994)], SpikeOMatic version 0.6-1 [cf. 
Pouzat, Mazor and Laurent (2002)] and our proposed method-of-moments 
estimator v. For simplicity, let v\ denote the estimate of v given by SpikeO- 
Matic and i>2 be corresponding to the estimate given by the Lewicki software. 

5.1. Synthesized data set. The spike shapes, the number of spikes from 
each neuron and the number of overlapping spikes for the synthesized record- 
ing can be found in Section 7 of Lewicki (1994). The true number of neurons 
v for the synthesized recording is 6. However, all three methods underesti- 
mate v. In particular, v = 4 while v\ = i>2 = 5. A likely reason is that the two 
smallest spike shapes are too close to each other for them to be identified as 
two distinct spike shapes (and not one) by the 3 estimators. A graph of the 
6 spike shapes is given in Figure 7 of Lewicki (1994). 

With reference to u, we have set the tuning parameters to 7threshoid = 1, 
a = 0.1 and p = p m ax as in Section 4. p max turns out to be 18. SpikeOMatic 
is used for spike detection and n = 746 spikes are detected. Table 5 lists 
the eigenvalues X\(M p ) > • • • > Aig(M p ) with X^(M P ) = 0.96 < 7thrcshoid = 
1. Thus, as it stands, our estimate v misses returning the value 5 by a 
very narrow margin. Also, we observe from Lewicki (1994) that 7r„_i + tt u < 
32/895 0.036, which is rather small. In fact, (p max + 1)0.036 0.68, which 
is way below 7threshoid- Thus, if we suspect that ir u is this small, we should 
lower the value of 7threshoid to below 1. For example, if 7threshoid is set to be 
0.8 say, we shall obtain u = 5 for the synthesized data set. However, reducing 
the value of 7thrcshoid would, of course, increase the chance of false positives 
too. 
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Table 5 

The eigenvalues in decreasing order computed from our algorithm for the two data sets 



Synthesized recording 


7.86 


4.73 


4.52 


2.02 


0.96 


0.25 


0.10 


0.05 


0.03 


0.01 




0.00 


0.00 


0.00 


0.00 


0.00 


0.00 


-0.02 


-0.09 


-1.42 




Actual recording 


6.64 


3.83 


1.59 


1.17 


0.86 


0.67 


0.48 


0.33 


0.20 


0.14 




0.11 


0.07 


0.05 


0.01 


-0.04 


-0.11 











5.2. Actual spike train data. When the 3 methods are applied to the 
actual spike train data set, we obtain v = 4 while v\ = 12 and vi = 9. Lewicki 
[(1994), page 1020] inferred that the number of neurons u generating the 
actual spike train recording is still 6. Thus, it would appear that v\ and 
i>2 have overestimated v and that they give rather different values of v for 
the synthesized and actual data sets. On the other hand, v, with the tuning 
parameters as in Section 4, gives v = 4 as in the synthesized data set. This 
shows that v is rather stable and is probably less variable than v\ or U2- As 
in the discussion of Section 5.1, if we suspect that ir u is very small such that 
(Pmax + l)7r„ < 1, we should lower the value of 7threshoid m order to detect 
this spike shape. If we reduce the value of 7threshoid to 0.8, we obtain u = 5 
(from the eigenvalues of Table 5). 

We conclude this section by presenting the parameter settings for the 
three methods. The parameters for Lewicki's software are set as his default 
values for the analysis of the synthesized recording. Here a spike is defined 
as the window of measurements that are within 1.0 millisecond prior to the 
occurrence of the peak and 4.0 milliseconds after the occurrence of the peak. 
Since the sampling frequency is 20 kHz, there are 20 measurements before 
the peak and 80 measurements after the peak. 

The parameters for SpikeOMatic software are given in Table 6. For the 
detailed explanation of the meaning of the parameters, we refer the reader to 
the SpikeOMatic manual which comes along with the software. We further 
note that the spike length is chosen to be d = 100, the same as for Lewicki's 
software, and n = 1447 spikes are detected with m = 2000. 

With respect to v, the spike detection procedure of SpikeOMatic is used. 
This part of the parameter setting is the same as in Table 6. Here d = 100, 
spike detection threshold = 3.00cr and the other tuning parameters for i> are 
the same as in Section 4. 

6. Concluding remarks. In conclusion, this article proposes a new esti- 
mator v for estimating the number of neurons v in a multi-neuronal spike 
train, v has a number of advantages over alternative estimators for v in 
the existing literature. First, it is a method-of-moments estimator and uses 
trigonometric moment matrices in its construction (unlike maximum likeli- 
hood estimators). As a result, the assumptions needed for v are minimal. 
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Table 6 



Parameter settings for 
SpikeOMatic 



Parameter 



Value 



template. length 
threshold 
sweep. length 
peak. location 
nb. noise, evt 
nb . samples . p er . sit e 
tolerance . factor 
nb. clusters. min 
nb. cluster. max 
nb. iterations 
nb. tries 



100 
20 
2000 
6 



160 



12 
25 
20 



3.0 



3.5 
2 



Indeed, the model (4) on which it is based is a nonparametric mixture distri- 
bution. (4) takes explicitly into account the possibility of overlapping spikes 
while no parametric assumptions are made on the noise distribution or the 
overlapping spike distribution. 

Second, we have managed to develop a rigorous nonasymptotic theory 
in support of v. This theory is reasonably simple and transparent. In par- 
ticular, it shows that v is a strongly consistent estimator of v under mild 
conditions. Also, perhaps more importantly, the nonasymptotic error bounds 
of Theorems 1 and 3 provide us with a way of setting the tuning parameters 
7threshold; V an d & so as to ensure that v performs well in practice. The latter 
is further justified by applying v to a number of spike train simulations in 
Section 4 and to an actual spike train data set in Section 5. 

Finally, we have assumed independent noise (i.e., the rji^'s and rf^s of 
Section 1) throughout this article. If the noise is a stationary and weakly 
dependent process, Theorem 2 still holds (i.e., v is strongly consistent) as 
long as 




i=l 



(14) 




m 



VI < j, k < p + 1 



l=i 



almost surely as min(m,n) — > oo. We observe that (14) is a rather mild 
condition and is satisfied by many weakly dependent processes. 
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Proof of Theorem 1. First suppose that i e {1, . . . , u}. We observe 
from Lemma 3 below that Aj(M Pi disc) = \(B). Let = diag((p+ l)7ri, • • • , (p- 
l)ir u ). Using Theorem A. 37 of Bai and Silverstein (2009), we have 



t |2 



i=l 



i=l k=l 



2 Yl 7r J 7r * 
i<j<fe<y 



1 _ gi(p+l)Oj-Wt) 



1 _ e i(^-/Xfc) 



Thus, 



|Ai(M Pidisc ) - (p+l)7Ti| < 



1 _ e i(p+l)(Mi-Mfc) 



1 _ gi(Mj-Mfe) 



2 E 7r i 7r * 

l<j'<fc<j/ 

Since M p = M p ^\ sc + 7r con t Mp )Cont , we observe from Corollary 4.9 of Stewart 
and Sun (1990) and Lemma 4 that 

Ai(Mp) > Aj(M P) disc) + 7TcontAp+l(Mp iCont ) 

> (p + + 27T7r cont < min V] / cont (/x + 27rj) > 

0<fi<2w L — ' 
I. j=—oo ) 



\ 



2 E W 

l<j<fc<l/ 



1 _ e iO+l)Oj-Mfc) 



1 _ e i(Mj-lik) 

Ai(Mp) < Ai(M Pjdisc ) + 7r con tAi(Mp )Con t) 



< (p + l)7Tj + 27T7r cont < max V] / cont (/i + 27rj) 

0<fi<2w 1 — ' 
^ j=-oo 



+ 



\ 



2 

l<j<k<v 



1 _ e i(p+l)(Mj-Mfc) 



1 _ e i(Mj-Mfc) 



This proves the first statement of Theorem 1. Next, suppose that i £ + 
1, . . . ,p + 1}. Then Aj(M p c iisc) = 0. Using Corollary 4.9 of Stewart and Sun 
(1990) and Lemma 4 again, we obtain the second statement of Theorem 1. 

□ 

Lemma 2. Let m> n be positive integers and A be a mx n matrix with 
complex-valued entries. Then the eigenvalues of AA* are the eigenvalues of 
A* A and (m — n) zeros where A* denotes the conjugate transpose of A. 
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Proof. We observe from the singular value decomposition of A that 
A = UDV* where U is a m x m unitary matrix, D a m x n diagonal matrix 
with nonnegative real numbers on the diagonal and V a n x n unitary matrix. 
Then A* A = VD*DV*, and AA* = UDD*U*. Lemma 2 follows since D*D 
and DD* are both diagonal matrices. □ 

Lemma 3. Let M p ^ sc be as in (7). With the notation and assumptions 
of Theorem 1, we have 



Xi(M P)disc ) = Xi(B) Vi = l,..., 
Ai(Aip,disc) = Vi = i/ + 1, 

where B is a v x v Hermitian matrix defined by 



( 



B 



(p + 



1 



i(p+l)(/ii-/i 2 ) 



\/ 7r l 7r 2- 



3 i(p+l)(/i 2 -Mi) 
- e i(M2-Mi) 



1 _ gi(p+l)0^-Mi) 



1 _ e i(/ii-M2) 
(p + 1)7T 2 

1 _ e i(P+l)(^-^2) 

1 _ e i(M^-M2) 

1 _ e i(P+ 1 )(Mi-^) \ 

1 _ gi(/ii-/i„) 
1 _ e HP+ 1 )(M2-M I /) 



Proof. Let 

/ 1 

e -i2/ii 



1 \ 

-i2/jj, 



/v^i o 



\ o 



\ 





-J 



Then M Pj disc = SE* and ■ 



Lemma 3 follows from Lemma 2. □ 



Lemma 4. Lei M PjCont 6e as m ^7j. With the notation and assumptions 
of Theorem 1, we have 



2ir min V] /cont^ + 27rj) 

j=-oo 
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< A p +i(M P|COn t) < Ai(M PjCOIlt ) 

oo 

<2ir max } f cont {v + 

0<fi<2iT * — ' 
j=-oo 

Proof. Let a = (oi, . . . ,a p+1 )' £ C p+1 . Then 

a^conta _ E£ 1 iESi 1 [/o 2 "E£-oo/cont(^ + 2vrOe i ^- J )/-^]a fc a; 
a*a a*a 

_ It I EjS gfc^f Eg-oo /cont(^ + 27TZ ) <fr 
_ 2^ g | gfc^f E/=-oo W/* + 2ttQ <fr 

Thus, for an arbitrary a E C p+1 such that a*a= 1, 

oo 

27r min V] /cont(^ + 27rj) 

0</^<27T ' 

j=-oo 

< a*M PiCont a 

oo 

< 2ir max V" / con t(^ + 27rj). 

0<A t <27r * — » 

j = -OD 

Since Ai(M PjCon t) = sup aeC p+i )0 » a=1 o*M P)Cont a and A p+ i(Afp iCO nt) = 
inf ogC |p+i a * a =i a*M PjCOn ta, Lemma 4 is proved. □ 

Proof of Theorem 2. We observe from the strong law of large num- 
bers that M p — > M p almost surely as min(m,n) —> oo. This implies that 
X u+ i(M p ) < jy/p + 1 < X u {Mp) almost surely as min(m,n) — > oo. □ 

Proof of Lemma 1. Using Markov's inequality, we observe for t = 
1, . . . , p that 



P(Cli. e ) = p( ±Yte-™* - Ee-W) 
\ j=i 



>e\E(e 



-UYx • 



E[\m- 1 V™ (e- l£Y i - Ee~^)\ k ] 
[e\E(e- liY ^)\] k 



Taking k = 3, 4, we obtain 
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m 3 [e\E(e- iiY i)\] 6 



l + 0[ - 

m 



□ 



PROOF of Theorem 3. Using Theorem A. 37 of Bai and Silverstein 
(2009), we have 



p+i 



(15) 



i=l 



p+i p+i 

EE 

j=i k=i 

v 

2]T(p-j + l)£ : 



<E QC J2J2\(M P hk- (M p ) jtk \ 



n Er=i 



v 



-ijXi 



E(e 



-ijx l 



2j>-j + l) 

j"=i 



-ijY, 



m- l J2T=i e ~ ijYl 



+ E 



2j>-i + i) 

J'=l 



e- 1 ^ E(e-^] 



+ \E(e- ijXl )\ 2 E n 



1 1 



<2^>-j + l) 



(l-e) 2 |£7(e-y y i)| 2 

e 2 |£;(e-y Xl )| 2 



+ 



< 



(i- e y 



£(p-i + i) 



(l-sfl^e-^)! 2 



n|E(e-v y i)| 2 



□ 
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